target - get gene specific tracks (bedgraph, wig, etc)

what would be nice is a bed with every CG, methylated and non-meth in sep files, with loci redundancy

Maybe not ideal but a start

---- 
files to start with

http://eagle.fish.washington.edu/bivalvia/Galaxy61-[all_CG_dinucleotides].tabular

http://eagle.fish.washington.edu/bivalvia/combinedCG_MBDgill_oyster.v9_5x.tabular


viz in igv http://eagle.fish.washington.edu/cnidarian/combinedCG_MBDgill_oysterv9_5x.igv



---

lets move to a individual gene basis

New Fasta from gene gff?

mRNA gff…..

http://aquacul4.fish.washington.edu/~steven/armina/oyster.v9.glean.final.rename.mRNA.gff



Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.17.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta>

Options:
     -fi     Input FASTA file
     -bed     BED/GFF/VCF file of ranges to extract from -fi
     -fo     Output file (can be FASTA or TAB-delimited)
     -name     Use the name field for the FASTA header
     -split     given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
     -tab     Write output in TAB delimited format.
          - Default is FASTA format.

     -s     Force strandedness. If the feature occupies the antisense,
          strand, the sequence will be reverse complemented.
          - By default, strand information is ignored.

robertsmac:bin sr320$ ./fastaFromBed -fi /Volumes/web/cnidarian/oyster.v9.fa -bed /Users/sr320/Desktop/oyster.v9.glean.final.rename.mRNA.gff -fo /Volumes/web/cnidarian/TJGR_genomic_gene.fasta


PLEASE.gff
./fastaFromBed -name -fi /Volumes/web/cnidarian/oyster.v9.fa -bed /Users/sr320/Desktop/PLEASE.gff -fo /Volumes/web/cnidarian/TJGR_genomic_gene.fasta



http://eagle.fish.washington.edu/cnidarian/TJGR_genomic_gene.fasta

Now lets run BSMAP just on the this 


./bsmap -a //Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz -d /Volumes/web/cnidarian/TJGR_genomic_gene.fasta -o /Volumes/web/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v1.sam -p 8



Input reference file: /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa      (format: FASTA)
Load in 28027 db seqs, 
total_kmers: 43046721
Create seed table. 26 secs passed
max number of mismatches: read_length * 8%      max gap size: 0
kmer cut-off ratio:5e-07
max multi-hits: 100     max Ns: 5     seed size: 16     index interval: 4
quality cutoff: 0     base quality char: '!'
min fragment size:28     max fragemt size:500
start from read #1     end at read #4294967295
additional alignment: T in reads => C in reference
mapping strand: ++,-+
Single-end alignment(8 threads)
Input read file: //Volumes/NGS Drive/NGS Raw Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz      (format: gzipped FASTQ)
Output file: /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v1.sam      (format: SAM) size 226602635 bp. 8 secs passed


Total number of aligned reads: 93661017 (67%)
Done.
Finished at Fri May  3 03:10:08 2013
Total time consumed:  51264 secs

http://eagle.fish.washington.edu/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v1.sam


methratio

python methratio.py -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa -u -z -g -o  /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_A.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v1.sam

@ Fri May  3 06:33:51 2013: reading reference /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa ...
@ Fri May  3 06:34:05 2013: reading /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v1.sam ...
[samopen] SAM header is present: 28027 sequences.

running on hummingbird..


[sam_read1] reference 'ZS:Z:-+' is recognized as '*'.
Parse error at line 59068474: invalid CIGAR character
@ Fri May  3 07:00:45 2013: combining CpG methylation from both strands ...
@ Fri May  3 07:01:02 2013: writing /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_A.txt ...
@ Fri May  3 07:13:35 2013: done.
total 35267328 valid mappings, 42064194 covered cytosines, average coverage: 5.02 fold.



Getting error in methratio error so lets try bsmap with BAM output

./bsmap -a /Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fasta -o /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v2.bam -p 10


Input reference file: /Volumes/web-1/cnidarian/TJGR_genomic_gene.fasta      (format: FASTA)
Load in 28027 db seqs, total size 226602635 bp. 7 secs passed
total_kmers: 43046721
Create seed table. 17 secs passed
max number of mismatches: read_length * 8%      max gap size: 0
kmer cut-off ratio:5e-07
max multi-hits: 100     max Ns: 5     seed size: 16     index interval: 4
quality cutoff: 0     base quality char: '!'
min fragment size:28     max fragemt size:500
start from read #1     end at read #4294967295
additional alignment: T in reads => C in reference
mapping strand: ++,-+
Single-end alignment(10 threads)
Input read file: /Volumes/NGS Drive/NGS Raw Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz      (format: gzipped FASTQ)
Output file: /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v2.bam      (format: SAM, automatically convert to BAM)

-hummingbird

Total number of aligned reads: 93661017 (67%)
Done.
Finished at Fri May  3 15:07:33 2013
Total time consumed:  24661 secs
sh: sam2bam.sh: command not found

http://eagle.fish.washington.edu/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v2.bam


---
try the BSP format
hummingbird


d-128-95-149-219:bsmap-2.73 sr320$ ./bsmap -a /Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq.gz -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fasta -o /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v3.BSP -p 3


RUNNING

--

ALSO lets try SAM with uncompressed 

./bsmap -a /Volumes/NGS\ Drive/NGS\ Raw\ Data/Oyster_BSseq/filtered_Unlabeled_NoIndex_L003_R1.fastq -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fasta -o /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v4.sam -p 6
---


now methratio on each….


python methratio.py -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa -u -z -g -o  /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_C.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v3.BSP



d-128-95-149-219:bsmap-2.73 sr320$ python methratio.py -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa -u -z -g -o  /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_C.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v3.BSP
@ Sun May  5 06:49:42 2013: reading reference /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa ...
@ Sun May  5 06:49:56 2013: reading /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v3.BSP ...
     @ Sun May  5 06:54:44 2013: read 10000000 lines
     @ Sun May  5 06:59:33 2013: read 20000000 lines
     @ Sun May  5 07:04:21 2013: read 30000000 lines
     @ Sun May  5 07:09:09 2013: read 40000000 lines
     @ Sun May  5 07:13:59 2013: read 50000000 lines
     @ Sun May  5 07:18:47 2013: read 60000000 lines
     @ Sun May  5 07:23:39 2013: read 70000000 lines
     @ Sun May  5 07:28:33 2013: read 80000000 lines
     @ Sun May  5 07:33:25 2013: read 90000000 lines
@ Sun May  5 07:35:13 2013: combining CpG methylation from both strands ...
@ Sun May  5 07:35:31 2013: writing /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_C.txt ...
@ Sun May  5 07:50:00 2013: done.
total 56211463 valid mappings, 48430264 covered cytosines, average coverage: 6.92 fold.
d-128-95-149-219:bsmap-2.73 sr320$




python methratio.py -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa -u -z -g -o  /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_D.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v4.sam


d-128-95-149-219:bsmap-2.73 sr320$ python methratio.py -d /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa -u -z -g -o  /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_D.txt -s /Volumes/Bay3/Software/BSMAP/bsmap-2.73/samtools /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v4.sam
@ Sun May  5 06:50:32 2013: reading reference /Volumes/web-1/cnidarian/TJGR_genomic_gene.fa ...
@ Sun May  5 06:50:44 2013: reading /Volumes/web-1/cnidarian/BiGill_BSMAP_GillMBD_genomic_gene_v4.sam ...
[samopen] SAM header is present: 28027 sequences.
     @ Sun May  5 06:55:21 2013: read 10000000 lines
     @ Sun May  5 06:59:57 2013: read 20000000 lines
     @ Sun May  5 07:04:33 2013: read 30000000 lines
     @ Sun May  5 07:09:09 2013: read 40000000 lines
     @ Sun May  5 07:13:47 2013: read 50000000 lines
     @ Sun May  5 07:18:24 2013: read 60000000 lines
     @ Sun May  5 07:23:04 2013: read 70000000 lines
     @ Sun May  5 07:27:45 2013: read 80000000 lines
     @ Sun May  5 07:32:26 2013: read 90000000 lines
@ Sun May  5 07:34:09 2013: combining CpG methylation from both strands ...
@ Sun May  5 07:34:27 2013: writing /Volumes/web-1/cnidarian/BiGill_methratio_gene_geno_D.txt ...
@ Sun May  5 07:49:08 2013: done.
total 56211459 valid mappings, 48430256 covered cytosines, average coverage: 6.92 fold.
d-128-95-149-219:bsmap-2.73 sr320$